The Mosaic Genome of
Warm-Blooded Vertebrates

Giorgio Bernardi, Birgitta Olofsson, Jan Filipski
Marino Zerial, Julio Salinas, Gerard Cuny 
Michele Meunier-Rotival, Francis Rodier 



Density gradient centrifugation in the
presence of certain DNA ligands, such
as silver ion (Ag+) or BAMD
[3,6-bis-(acetatomercurimethyl)dioxane] (1-3),
results in the separation of nuclear DNA
from warm-blooded vertebrates into four
major components and several satellite



and minor (such as ribosomal DNA) 
components (4-9). The former include: 
(i) two light components, L1 and L2,
poorly or not resolved in some genomes
(J); and (ii) two or three heavy components,
H1, H2, H3. Figure 1 shows the
relative amounts and the buoyant densi- 
ties of the major components of the 
chicken, mouse, and human genomes (8, 
9). The heavy components account for 
the strong heterogeneity and marked 
asymmetry of main-band DNA's from 
warm-blooded vertebrates (4-8). In con- 
trast, main-band DNA's from most cold- 
blooded vertebrates show (Fig. 1) weak 
heterogeneities, only slightly skewed 
CsCl peaks, and major components that 
have buoyant densities which are only or 
mainly in the same range as the light 
components of warm-blooded verte- 
brates (5, 10, 11). The families of mole- 
cules forming the major components are 
derived, by the unavoidable breakage
which accompanies DNA preparation,
from much longer DNA segments, the



isochores (8, 12) (Fig. 2), which have an 
average size well above 200 kilobases 
(kb) (6, 7), and are fairly homogeneous in 
base composition (6, 8, 13-16). 

Here we have studied (i) the distribu- 
tion of several genes, of some families of 
interspersed repeats, and of some inte- 



grated viral sequences in the major com- 
ponents of genomes from warm-blooded 
vertebrates; and (ii) the correlation
between this distribution and the base
composition and codon usage of these
sequences (17-21).



Distribution of Genes, Interspersed 
Repeats, and Integrated Viral Sequences 

The sequences investigated and the 
major components in which they were 
found are shown in Table 1. The main 
findings (described below) concern
three issues: (i) some properties of iso- 
chores, as judged from the localization of 
specific sequences; (ii) the relation be- 
tween isochores and chromosomes; and 
(iii) the genomic distribution of the se- 
quences investigated. 

Single-copy genes are located in single 
major components (Fig. 3). This indicates
that the separation of major components
corresponds to a real fractionation
of the genome; and that large segments
around the genes tested are
compositionally fairly homogeneous. Indeed,
if either point were incorrect, a given 
single-copy gene would be found in more 
than one component. The same conclu- 
sions were drawn earlier as a result of 
different experimental approaches (4-9, 
14-16). The only exception to these con- 
clusions is the c-myc gene which seems 
to be located at an H1-H2 border. 

Clustered genes are located in the 
same major component (Fig. 3), as ex- 
pected if isochore size is large compared 
to gene cluster size, from 4 to 40 kb in 
the cases under consideration (Table 1). 
In contrast, scattered genes belonging to 
the same family may be located in different
major components. For instance, the
actin genes and pseudogenes are scat- 
tered over all DNA components (22). 

Genes present in a given major component
may be located on different chromosomes.
In chicken, the αA- and αD-globin
genes are located on the largest chromosomes,
the conalbumin gene is located
on a chromosome of intermediate size,
and the β- and ρ-globin genes are present
on a small macro- or on a microchromosome
(23); therefore, the major component
in which all these genes are located,
H2, is present on several chromosomes.
Conversely, genes present in different 
major components may be located on the 
same chromosome. For example, the 
human Ha-ras 1 and β-globin genes,
which belong to components H3 and L2 
respectively, are both located on chro- 
mosome 11. 

The distribution of genes and gene 
clusters within different major components
is highly nonuniform. The data of
Table 1 were obtained from the study of 34
genes corresponding to 24 "loci" (defined
here as isolated genes or gene
clusters) and to 14 functionally unrelated 
proteins. About half of the loci examined 
for each genome are present in the heavi- 
est components (H2 or H3), which only 
represent 8 or 4 percent, respectively, of 
the DNA. 

Families of interspersed repeated se- 
quences are concentrated in some major 
components (15). For instance, the 
Bam HI family and the CR-1 (Alu-like) 
family are almost only present in the two 
light components of mouse (14) and in 
the heaviest component of chicken (16),
respectively. 

Integrated viral sequences are only or
mainly located in a given major component.
The integrated sequences of bovine
leukemia virus (BLV) and hepatitis
B virus (HBV) from the Alexander cell
line were almost only found in components
H2 and H3, respectively; those of
mouse mammary tumor virus (MMTV)
were mainly found in component L2 of
mice (24).

Summary. Most of the nuclear genome of warm-blooded vertebrates is a mosaic of
very long (>>200 kilobases) DNA segments, the isochores; these isochores are fairly
homogeneous in base composition and belong to a small number of major classes
distinguished by differences in guanine-cytosine (GC) content. The families of DNA
molecules derived from such classes can be separated and used to study the genome
distribution of any sequence which can be probed. This approach has revealed (i) that
the distribution of genes, integrated viral sequences, and interspersed repeats is
highly nonuniform in the genome, and (ii) that the base composition and ratio of CpG
to GpC in both coding and noncoding sequences, as well as codon usage, mainly
depend on the GC content of the isochores harboring the sequences. The compositional
compartmentalization of the genome of warm-blooded vertebrates is discussed
with respect to its evolutionary origin, its causes, and its effects on chromosome
structure and function.

The distribution of genes and inter- 
spersed repeats in the major components 
seems to be conserved in evolution. For 
instance, the α- and β-globin gene clusters,
vimentin and c-abl genes are located
in components identical or close in
GC levels in different mammals. The 
same applies to specific families of inter- 
spersed repeats (14-16). 



Fig. 1 (left). Histograms (based on data from 5, 8, 9, 11) showing the relative amounts and
buoyant densities of the major DNA components from Cyprinus carpio, Xenopus laevis
(left panel), chicken, mouse, and man (right panels). Satellite and minor components
(namely, components each representing less than 3 percent of DNA) (8) are not shown,
with the exception of the minor components from mouse and chicken which have the
buoyant density of H3; no genes have been localized so far in these minor components.
Carp and Xenopus genomes represent extreme cases of low and high heterogeneity among
cold-blooded vertebrates. Notice that even in Xenopus, DNA having a density higher
than 1.704 represents less than 10 percent of the genome, as compared with 30 to
40 percent for warm-blooded vertebrates.

Fig. 2 (right). Scheme depicting the mosaic organization of nuclear DNA from
warm-blooded vertebrates. When the very long DNA segments, fairly homogeneous in
base composition, the isochores, undergo breakage during DNA preparation, four major
families of molecules having different GC contents are generated. These major DNA
components can be resolved from each other by preparative density gradient
centrifugation in the presence of certain DNA ligands. Component H3, minor, and
satellite components are neglected in this scheme. If isochores correspond, as
suggested [(8) and present work], to the DNA segments present in Giemsa and Reverse
chromosomal bands as obtained at high resolution (42), their average size is
~1250 kb (41). This means that the 30- to 100-kb DNA molecules from the preparations
used in this work are 12 to 40 times smaller in size than isochores; DNA molecules
bridging contiguous isochores are, therefore, as expected, rare in our preparations
(one such molecule, L2-H2, is shown).

Fig. 3. A typical experiment for localizing a gene in the major components of a
genome from a warm-blooded vertebrate. A chicken β-globin probe was hybridized to
Eco RI digests of unfractionated chicken DNA (T) and its major components L1, L2,
H1, and H2. The probe revealed not only the 6-kb fragment carrying the β-globin
gene, but also the 9.4-kb fragment carrying the ρ-globin gene; both genes, which
belong to the same cluster (Table 1), are located in the H2 component. M is a
size-marker restriction digest (48). An alternative approach for gene localization
consisted in hybridizing appropriate cloned complementary DNA probes to restriction
digests from DNA fractions obtained by preparative density gradient centrifugation
in the presence of DNA ligands; fractions were then analyzed in CsCl density
gradients in order to assess their buoyant densities and assign them to major
components. In every case, the sizes of the hybridizing restriction fragments were
in agreement with those already published (references in Table 1). The DNA
preparations submitted to fractionation had molecular sizes between 30 and 100 kb.
32P-labeling of probes was done according to Rigby et al. (49). Fragment transfers
from 0.8 to 1 percent agarose gels onto nitrocellulose, hybridization, and
autoradiography were done as described (48).



Gene Composition and Codon Usage

The GC (G, guanine; C, cytosine) contents
of genes, exons, and introns are
linearly related to those of the major
components in which they are located
(Fig. 4, A to C). The slopes of the lines
representing these relationships are
equal to 1.9 for genes, to 3.0 for introns,
and to 1.0 for exons. While "light" genes
are, on the average, only slightly higher
in GC content than light components,
"heavy" genes have increasingly more
GC than the corresponding heavy components
(Fig. 4A). An increasing deviation
from the unit slope is also exhibited
by introns (Fig. 4B). In contrast, exons
show a unit slope, but are about 10
percent higher in GC, on the average,
than the components in which they are
located (Fig. 4C). The larger scatter of
points exhibited by exons compared to
introns and genes is probably due to
their smaller sizes, but a few exons deviate
from the common relationship. Finally,
integrated viral sequences and long
interspersed repeats seem to show a
closer match in composition with the
major components in which they are
located, compared to genes (Fig. 4A).
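
The relationships of Fig. 4 rest on two simple computations: the GC content of a sequence and the slope of a least-squares line through paired GC values. The sketch below illustrates both in Python. It is only an illustration under stated assumptions: the function names `gc_percent` and `least_squares_slope` are hypothetical, and the numbers in the usage lines are invented to exercise the code; they are not data from Table 1.

```python
def gc_percent(seq):
    """Percent G+C among the unambiguous bases (A, C, G, T) of a DNA sequence."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def least_squares_slope(x, y):
    """Slope and intercept of the ordinary least-squares line y = slope*x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values are all identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical percent-GC values, chosen only for illustration.
component_gc = [40.0, 45.0, 50.0]   # GC of the DNA components (x axis)
sequence_gc = [50.0, 55.0, 60.0]    # GC of the sequences located in them (y axis)
slope, intercept = least_squares_slope(component_gc, sequence_gc)
```

With these invented values the fitted line has unit slope and a positive offset, the pattern the text describes for exons; a slope above 1 would correspond to the behavior described for genes and introns.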

The higher GC content of heavy relative
to light exons is due to a different
codon usage and not to the amino acid
composition of the corresponding pro- 
teins. Indeed, if the codons used in 
heavy exons (53 to 67 percent GC) were 
replaced with the synonymous codons 
lowest in GC also used in the same 
exons, the GC contents of heavy exons 
would decrease to about 40 percent, a 
value as low as that of the lowest of light 
exons (40 to 55 percent GC), without any 
amino acid change. In fact, the lowest 
"allowed" GC level for heavy exons is
practically identical to that of light ex- 
ons. A striking example of large differ- 
ences in the GC content of exons not 
accompanied by changes in amino acids 
is that of cardiac and skeletal mouse
α-actin genes. These are located in L2 and
H2 components, respectively, and differ 
by 8 and 16 percent in overall and third- 
position GC content, respectively (see 
below); yet, the corresponding proteins 
show only a 1 percent difference in ami- 
no acids (25). 
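
The substitution argument above (replace every codon with the lowest-GC synonymous codon that is also used in the same exon, then recompute GC) can be sketched as follows. This is a minimal illustration, assuming the standard genetic code and an in-frame coding sequence; the helper names (`lowest_gc_recoding`, `translate`, `gc_percent`) and the 12-base example sequence are hypothetical and are not taken from the genes studied.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(cds):
    """Split an in-frame coding sequence into a list of codons."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds):
    """Amino acid string encoded by the sequence ('*' marks a stop codon)."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(cds))


def gc_percent(seq):
    """Percent G+C of a sequence made of A, C, G, T."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def lowest_gc_recoding(cds):
    """Replace each codon by the lowest-GC synonymous codon used in the same sequence."""
    codons = split_codons(cds)
    best = {}  # amino acid -> lowest-GC codon seen in this sequence
    for codon in codons:
        aa = CODON_TABLE[codon]
        gc = codon.count("G") + codon.count("C")
        if aa not in best or gc < best[aa].count("G") + best[aa].count("C"):
            best[aa] = codon
    return "".join(best[CODON_TABLE[codon]] for codon in codons)


# Hypothetical example: Ala-Ala-Leu-Leu encoded with mixed codons.
original = "GCCGCTCTGCTA"
recoded = lowest_gc_recoding(original)
```

In this toy case the recoded sequence encodes the same four amino acids while its GC content drops, which is the kind of comparison the paragraph makes between the actual and the lowest "allowed" GC levels of heavy exons.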

Since most of the synonymous codons 
differ in third positions, we should expect
that GC contents in codon third
positions are different for heavy and light 
exons. This expectation is borne out: the
GC level of codon third positions ranges
from 43 to 69 percent for the light genes
and from 61 to 90 percent for the heavy
genes (Fig. 5A). A few genes show,
however, a deviation from the general 
trend (see next section). 
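
The third-position GC level discussed here is straightforward to compute from an in-frame coding sequence. The following is a minimal sketch; the function name `gc3_percent` and the two short example sequences are hypothetical and serve only to show the calculation.

```python
def gc3_percent(cds):
    """Percent G+C at codon third positions of an in-frame coding sequence."""
    cds = cds.upper()
    if len(cds) == 0 or len(cds) % 3:
        raise ValueError("coding sequence length must be a nonzero multiple of 3")
    third_positions = cds[2::3]  # every third base, starting at codon position 3
    gc = sum(1 for base in third_positions if base in "GC")
    return 100.0 * gc / len(third_positions)


# Two hypothetical sequences encoding the same peptide (Ala-Ala-Leu-Leu).
gc3_mixed = gc3_percent("GCCGCTCTGCTA")   # third positions: C, T, G, A
gc3_low = gc3_percent("GCTGCTCTACTA")     # third positions: T, T, A, A
```

Because most synonymous codons differ only in the third position, two sequences coding for the same peptide can differ widely in this value, as the example shows.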

Science, Vol. 228

Fig. 4. Plot of the GC contents of (A) genes, viral and long interspersed repeated sequences, (B) introns, and (C) exons against the GC levels and
the buoyant densities of DNA components in which they are located. The numbers indicate genes (see Table 1). The line was drawn using the
least-squares method. The unit slope line corresponds to the coincidence in GC contents of genes and major components in which genes are
located.

Genes located in heavy components
show a decreased discrimination against
CpG doublets, which tend to be avoided
in vertebrate genomes (26). In most cases
CpG is strongly discriminated against
in light exons, but only slightly in heavy
exons (Fig. 5B). As would be
expected, most of the differences between
heavy and light exons concern
intercodon CpG, namely doublets in
which third-position C is followed by
first-position G. Intracodon differences,
namely preferential usage of codons
containing CpG instead of synonymous
codons not containing CpG, are, however,
also found in heavy genes (not shown).
Differences in CpG levels similar to
those of exons are also found in the
introns of the corresponding genes and in
the untranslated regions (data not
shown).
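
The CpG measures used in this paragraph can be made concrete with a short sketch. GpC has the same base composition as CpG, so the CpG to GpC ratio compares the two doublets at equal GC content. The code below is an illustration only: the function names are hypothetical, the coding sequence is assumed to start at codon position 1, and the nine-base example is invented.

```python
def dinucleotide_count(seq, pair):
    """Number of (possibly overlapping) occurrences of a two-base doublet."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == pair)


def cpg_to_gpc_ratio(seq):
    """Ratio of CpG doublets to GpC doublets in a sequence."""
    gpc = dinucleotide_count(seq, "GC")
    if gpc == 0:
        raise ValueError("sequence contains no GpC doublet")
    return dinucleotide_count(seq, "CG") / gpc


def cpg_by_codon_position(cds):
    """Return (intracodon, intercodon) CpG counts for an in-frame coding sequence.

    An intercodon CpG has its C at a codon third position and its G at the
    first position of the next codon; every other CpG lies within one codon.
    """
    cds = cds.upper()
    intra = inter = 0
    for i in range(len(cds) - 1):
        if cds[i:i + 2] == "CG":
            if i % 3 == 2:      # C is the third base of its codon
                inter += 1
            else:
                intra += 1
    return intra, inter


# Hypothetical in-frame sequence: GCC GGC ACG
example = "GCCGGCACG"
ratio = cpg_to_gpc_ratio(example)
intra, inter = cpg_by_codon_position(example)
```

In the example, one CpG spans the boundary between the first and second codons and one lies inside the last codon, so the two classes distinguished in the text are counted separately.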

If the composition of genes and the 
codon usage rules (as discussed above)
are generally valid, genes from any
warm-blooded vertebrate should fall into
compositional classes such as those
found for genes located in different components
(Fig. 4A), and these classes
should, in turn, largely determine codon
usage. Both the first and the second 
prediction are fulfilled (Fig. 6B), proving 
the general validity of the "rules." Two 
additional points made by Fig. 6 are that 
human genes (as well as those of other 
warm-blooded vertebrates tested) are 
predominantly heavy (Fig. 6B) although 
less so than indicated by the smaller gene 
sample of Table 1, and that, in contrast, 
light genes predominate in the light genomes
of cold-blooded vertebrates, as
would be expected (Fig. 6A). In this 
second case too, GC content in third 
positions of codons and CpG/GpC ratios 
are correlated with overall GC content 
(not shown). 
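
The histograms of Fig. 6 count genes or gene clusters in 3 percent GC intervals. A minimal sketch of that binning step is given below; the function name `gc_histogram`, the choice to label each interval by its lower bound, and the input values are assumptions made for illustration and are not data from the EMBL library.

```python
from collections import Counter


def gc_histogram(gc_values, width=3.0):
    """Count values per GC interval; keys are the lower bounds of the intervals."""
    counts = Counter()
    for gc in gc_values:
        lower_bound = width * int(gc // width)
        counts[lower_bound] += 1
    return dict(sorted(counts.items()))


# Hypothetical percent-GC values for five sequences.
histogram = gc_histogram([41.0, 42.5, 44.9, 63.0, 64.2])
```

Here the five invented values fall into three intervals starting at 39, 42, and 63 percent GC; with real data, the relative heights of the low-GC and high-GC bins are what distinguish the two panels of the figure.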



Implications 

The mosaic genome organization dis- 
cussed so far is typical of warm-blooded 
vertebrates. When the genomes of cold-blooded
and warm-blooded vertebrates



are compared with each other, it is clear 
that the main difference concerns the 
presence of abundant, heavy compo- 
nents in the latter (Fig. 1). As was just 
mentioned, this difference is accompa- 
nied by a predominance of heavy genes 
in warm-blooded vertebrates, and of 
light genes in cold-blooded vertebrates. 
These findings raise the question of the
evolutionary origin of the heavy components
present in the genome of warm-blooded
vertebrates.

The evolutionary origin of the heavy 
components of the genome of warm- 
blooded vertebrates may be visualized as 
being due to (i) regional increases in GC 
content of preexisting light sequences; 
(ii) amplification of preexisting heavy 
sequences; or (iii) de novo formation of
sequences. While there is no evidence,
so far, in favor of the latter process
(which probably is operational in the
generation of the clustered short repeats
of satellite DNA's), the other two are
well documented. The first one is supported
by our finding that light genes,
ancestrally present in the light genomes
of cold-blooded vertebrates (Figs. 1 and
6), have become heavy and are found in
the heavy components of warm-blooded
vertebrates (Fig. 6). For instance, the
β-globin gene is light in Xenopus but heavy
in chicken (Fig. 4C), and the insulin gene
is light in hagfish (45 percent GC) but 
heavy in man (64 percent GC). The sec- 
ond process is exemplified by the ampli- 
fication of heavy Alu sequences in mam- 
malian genomes. The large difference in 
copy number of mouse and human Alu 



sequences (27) indicates that the amplifi- 
cations of such interspersed repeats 
were recent events compared to the
formation of heavy isochores.

The molecular mechanisms underlying 
these processes are different. The ampli- 
fication of preexisting heavy Alu re- 
peats, like that of interspersed repeats in 
general, implies rounds of duplication 
and insertion events. The specific 
genome distribution of different families 
of interspersed repeats indicates that 
such insertion events were targeted to- 
ward isopycnic isochores, namely iso- 
chores of matching GC content, or that 
the insertion stability was dependent
upon such a correlation (or both). It 
should be mentioned here that the viral 
sequences explored exhibit the same 
phenomenon. 

In contrast, the main process responsi- 
ble for the formation of heavy isochores, 
namely the regional increase in GC, was 
brought about by (i) point mutations in 
coding sequences, mainly in third posi- 
tions (Fig. 5A); or (ii) point mutations, 
deletions, and additions in introns (28). 
This process raises two questions con- 
cerning, respectively, its causes and its 
effects. 

The causes for the regional GC increases
are unknown at present, but they
might only or mainly be of a structural
nature, since they affect both coding and
noncoding sequences to comparable extents.
These causes might be related to
the requirements of chromosome structure
at the temperature prevailing in the
cells of warm-blooded vertebrates. Additional
explanations are needed, however,
to account (i) for the preferential
distribution of genes in the heavy DNA
components (Table 1 and Fig. 6), namely
for a distribution which requires the largest
GC change in ancestrally light genes
and also the highest increase in CpG
doublets; and (ii) for the higher GC level
of exons and introns compared to that of
the components in which they are located
(Fig. 4, B and C). These features
might be associated with the best protection
of genes against DNA "breathing"
and mutability in warm-blooded vertebrates
(29), but there may be other
causes as well.

Fig. 5 (left). Plot of GC levels of third positions of codons (A) and ratios of CpG to GpC
(B) for genes and exons against the GC contents of DNA components harboring the
genes. Other indications as in Fig. 4. The cluster of unnumbered genes in B comprises
genes 22 and 25 to 29.

Fig. 6 (right). Plot of number of genes or gene clusters in 3 percent GC intervals (N)
against GC of corresponding complementary DNA's. Primary data were from the EMBL
(European Molecular Biology Laboratories) library (April 1984). Average GC values
were taken for gene clusters. Short DNA sequences (<100 bp) were not taken into
account. Arrows refer to GC contents of exons studied (Fig. 4C; numbers refer to
genes; see Table 1). Results for loci from cold-blooded vertebrates (A) and from
human genome (B) are shown. This latter plot tests the prediction that genes from
warm-blooded vertebrates fall into compositional classes shown in (B). Since the
information on genes available in data banks is too limited, this test was done
with exons. A comparison of Fig. 4, A and C, indicates that such a choice has two
drawbacks, namely (i) that exons corresponding to a given component show a larger
GC scatter than genes and (ii) that exons corresponding to different components
show a smaller difference in GC than genes. In other words, the use of exons
instead of genes minimizes differences between compositional classes.

Table 1. Localization of some sequences in the major DNA components of warm-blooded
vertebrates. References for gene sequences and hybridization results are given in
parentheses; GC content of genes, exons, introns, codon third bases, and ratios of
CpG to GpC (see Figs. 4 and 5) were calculated from these data. Nonstandard
abbreviations: βM, β major; βm, β minor; αc, α cardiac; αs, α skeletal; p-omc,
pre-pro-opiomelanocortin. Sequences were localized in separated major components
or in preparative density gradients as indicated (see Fig. 3 legend).

L2
37. BLV (97)
H2†
18. Ig'' constant (66)
L2
38. HBV (92, 93)
L1
39. MMTV (94)
L2†

*Xenopus α- and β-globin genes are clustered; chicken αA- and αD-globin, β- and
ρ-globin, and ovalbumin, Y, and X genes are clustered within 4.4, 10, and 40 kb,
respectively (96-98); mouse β-major and β-minor globin genes are clustered within
16 kb (99), and human β, γ, δ, and ε globin are clustered within 42.8 kb (70).
†From preparative BAMD-Cs2SO4 density gradients. ‡In the case of vimentin, a
hamster DNA probe was used and sequence data are for the hamster vimentin gene.
The localization of c-myc will be discussed elsewhere. §See text.

The effects of the regional GC in- 
creases are twofold. First, a different 
codon strategy is used for different genes 
located in the same genome. This has 
been previously noted for rabbit (30) and 
human (31) α- and β-globin genes, and
for human and mouse α-actin genes (32).
What is shown here for the first time, 
however, is that a different codon usage 

(i) is the rule for different classes of 
genes from the same warm-blooded ver- 
tebrate genome; and (ii) is mainly deter- 
mined by the location of genes in heavy 
or light isochores. This compositional 
constraint predominates over other constraints
(33-38), which may also be operational
and be responsible for the deviations
of some genes from the general
relationship (Figs. 4C and 5A).

Second, heterogeneity in DNA com- 
position is associated with chromosomal 
G or R banding. The identification of 
isochores with the DNA segments present
in G or R bands was previously
suggested (8) on the basis of (i) indications
(39) that G bands correspond to
AT-rich, late-replicating DNA and R
bands to GC-rich, early-replicating DNA;
(ii) the observation (8) that the increase
in the heterogeneity of DNA composi- 
tion when moving from cold-blooded to 
warm-blooded vertebrates (5) is paral- 
leled by an increased G and R banding; 
and (iii) the parallel evolutionary conser- 
vation of isochores (as judged by the 
location of specific sequences), and of 
chromosomal bands (40). It should be 
recalled here (i) that, as expected, differ- 
ent isochores are represented on the 
same chromosome and the same iso- 
chore is represented on different chro- 
mosomes; and (ii) that the estimated 
average size of isochores (>>200 kb) is 
compatible with that of chromosomal 
bands (~1250 kb) (41) for the more than
2000 bands obtained at high resolution 
(42). This notion is effectively reinforced 
by (i) the confirmation of the first two 
points mentioned above by recent results 
obtained both in our and in another laboratory
(41); (ii) the fact that gene amplification
leads to the appearance of homogeneous
staining regions in chromosomes
(43), as expected if the genome
segments which are amplified are smaller
than isochores; (iii) the presence, in
early-replicating DNA, namely in R bands,
of genes (44-46) which are located in the
heaviest component (human c-Ha-ras 1
and α-globin genes and the mouse
α-globin gene) and the presence in
late-replicating DNA, namely in G bands, of
genes (46) that are located in the lightest
components (human β-globin gene); and (iv)
the parallelism between the preferential
distribution of genes in GC-rich isochores
found here and the paucity of
genes found in G bands (41).



Conclusions 



The investigations reported in this arti- 
cle show that the compositional com- 
partmentalization of the genome of 
warm-blooded vertebrates (i) largely dic- 
tates the base composition of genes and 
their codon usage; and (ii) plays a role in 
the timing of DNA replication and in the 
targeting of integration of mobile and 
viral sequences. From a more general 
viewpoint, it should be stressed that 
compositional compartmentalization (i) 
has an extremely wide evolutionary 
range, going as far as the mitochondrial 
genome (47); (ii) shows different patterns 
in different organisms, as exemplified 
here by cold-blooded and warm-blooded 
vertebrates; and (iii) plays a general role 
in genome structure and function; in- 
deed, the different GC levels of iso- 
chores, their different ratios of CpG to 
GpC, and the accompanying differences 
in potential methylation sites are bound 
to be associated with differences in DNA 
and chromatin structure, and possibly, 
with differences in the regulation of gene 
expression. 
Expression of Plasmodium falciparum
Circumsporozoite Proteins in 
Escherichia coli for Potential Use in 
a Human Malaria Vaccine 
W. Ripley Ballou, Robert A. Wirtz, James H. Trosper
Richard L. Beaudoin, Michael R. Hollingdale 
Abstract. The circumsporozoite (CS) protein of the human malaria parasite
Plasmodium falciparum may be the most promising target for the development of a
malaria vaccine. In this study, proteins composed of 16, 32, or 48 tandem copies of a
tetrapeptide repeating sequence found in the CS protein were efficiently expressed in
the bacterium Escherichia coli. When injected into mice, these recombinant products
resulted in the production of high titers of antibodies that reacted with the authentic
CS protein on live sporozoites and blocked sporozoite invasion of human hepatoma
cells in vitro. These CS protein derivatives are therefore candidates for a human
malaria vaccine.

The feasibility of immunization against
the sporozoite stage of malaria has been
established. Irradiated sporozoites have
been used to immunize and protect both
man and animals (1). This protection is
correlated with antibody to a protein on
the surface of the sporozoite, the
circumsporozoite (CS) protein (2-6). Monoclonal
antibodies (Mab's) to the CS protein
block infection with sporozoites in vitro
and protect animals in vivo (3, 4, 6).

Recently, Dame et al. (7) cloned and
sequenced the complete CS gene of the
human malaria parasite Plasmodium
falciparum. The gene encodes a protein of
412 amino acids. This protein has a sequence
typical of a membrane protein
with an NH2-terminal signal peptide and
a COOH-terminal anchor domain. The
most striking feature of this polypeptide
is a large central repeat domain composed
of 37 tetrapeptides with the sequence
Asn-Ala-Asn-Pro interspersed
with four tetrapeptides with the sequence
Asn-Asp-Val-Pro. This general
structure is analogous to that of the CS
protein of the simian malaria parasite P.
knowlesi (5), although the overall sequence
homology between the CS protein
of P. falciparum and P. knowlesi is
very low. In fact, only two regions of
approximately 15 amino acids each, in
the charged sequences flanking the repeat
domain, are conserved (7).

Protection by Mab's to the CS protein 
is both species- and stage-specific and, in 
the case of P. knowlesi, Mab's react with 
the 12-amino-acid repeat region of the 



CS protein (9). These Mab's also bh 
the binding of polyclonal antisera to 
CS protein in a radioimmunometric 
say (10). Thus. Zavala et al. (W) p 
posed and Dame et al. (7) confirmed t 
the repeat domain was the immu 
dominant region of the CS protein. F 
different Mab's to the CS protein of 
falciparum recognized synthetic p 
tides of various lengths corresponding 
portions of the repeat region (7), 1 
immunodominant repeat region mi 
thus form the basis for a malaria vacc 
(7, 11). That such a vaccine would be 
widespread use is indicated by the fi 
ing that the CS gene is highly conser 
in P. falciparum isolates from many g 
graphic areas (12). Here we describe • 
efforts to develop a vaccine against 
sporozoite stage of P. falciparum us 
proteins containing tandem repeats 
the CS tetrapeptide sequence produt 
in Escherichia coli. 

Expression of the P. falciparum 
protein in E. coli. A recombinant p 
mid (pUC8 clone 1) containing the I 
RI insert from \mPf I (7) was the sou 
of the gene encoding the P. falcipat 
CS protein. This 2337 base pair ( 
fragment contains the entire CS g 
(Fig. lA). Since the sequence of the I 
16 amino acids of the CS protein 
characteristic of a cleaved signal pepi 
(7), these amino acids are presuma 
absent from the mature CS protein 
sporozoites. Restriction endonucle 
Stu I cleaves the CS gene in the 1 
codon of the sequence. Thus, a 1216 
Stu I-Rsa I fragment from pUC8 clor 
(Fig. I A) should encode all but the 1 
two amino acids of the mature CS i 
tein predicted from the sequence 
This fragment was isolated and lig£ 
into the XPL E. coli expression plasn 
pASl (Fig. IB) {13, 14), which had b 
cut with Bam HI and U-eated with D 
polymerase to create a blunt-end. In 
resulting plasmid, pCSP, the coding 
gion of the CS protein is fused, in frai 
to the translation initiation codon a* 
cent to the Bam HI site in pASl (73, . 